Project 1.

This data covers NBA players from the 2008-09 season and is in the file ppg2008.csv. It contains various statistics for each player. You might not know what each of the metrics means (I don’t), but they are just different dimensions of the data.

This is a visualization and data mining exercise.

What can you say about this dataset? Use the tools you learned here to make a report or a visual that highlights something interesting. Maybe compare players, especially how they have performed since then, based on the data here. Many of these players have reached their peak recently, and you will be able to find statistics about their performance in 2019.

Could you have predicted the successes and failures of some of the players based on analyses of the data? Maybe you could be a talent scout for an NBA team?

Think of it as your job, as a reporter for the New York Times, to make a single graphic that highlights something about this data. Explain the analysis that went into the graphic and present the code too. This should be done in a notebook so it is easy to evaluate.

#Load packages
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.1.3
library(reshape2)
## Warning: package 'reshape2' was built under R version 4.1.3
library(plyr)
## Warning: package 'plyr' was built under R version 4.1.3
library(scales)
## Warning: package 'scales' was built under R version 4.1.3
library(ComplexHeatmap)
## Loading required package: grid
library(circlize)
## Warning: package 'circlize' was built under R version 4.1.3
library(pheatmap)
## Warning: package 'pheatmap' was built under R version 4.1.3
## 
## Attaching package: 'pheatmap'
## The following object is masked from 'package:ComplexHeatmap':
## 
##     pheatmap
library(heatmaply)
## Warning: package 'heatmaply' was built under R version 4.1.3
## Loading required package: plotly
## Warning: package 'plotly' was built under R version 4.1.3
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ComplexHeatmap':
## 
##     add_heatmap
## The following objects are masked from 'package:plyr':
## 
##     arrange, mutate, rename, summarise
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
## Loading required package: viridis
## Warning: package 'viridis' was built under R version 4.1.3
## Loading required package: viridisLite
## Warning: package 'viridisLite' was built under R version 4.1.3
## 
## Attaching package: 'viridis'
## The following object is masked from 'package:scales':
## 
##     viridis_pal
library(tinytex)
## Warning: package 'tinytex' was built under R version 4.1.3
#Read in original 2008 data set
nba08<-read.csv("E:/Stats.FINAL_PROJECT/ppg2008.csv",header=TRUE)
nba08$Name<-with(nba08,reorder(Name,PTS))
nba08.m<-melt(nba08)
## Using Name as id variables
nba08.m<-ddply(nba08.m, .(variable), transform, rescale=rescale(value))
#Set the ggplot up to create the heatmap for the 2008-2009 NBA data
p<-ggplot(nba08.m, aes(variable,Name))+geom_tile(aes(fill=rescale),colour="white")+scale_fill_gradient(low="white",high="steelblue")
p
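Each statistic is rescaled independently to the 0-1 range by the `ddply(..., transform, rescale = rescale(value))` step, so the darkest tile in a column is just that column's maximum and the lightest is its minimum. The base-R sketch below reproduces that per-column min-max rescaling on a small made-up data frame (the players and values are illustrative, not real statistics) to show how a single extreme value squeezes every other player toward the light end of the scale.

```r
# Min-max rescaling of one numeric vector to [0, 1], mirroring what
# scales::rescale() does for each variable inside the ddply() call.
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  if (rng[1] == rng[2]) {
    # Constant column: scales::rescale() returns the midpoint, 0.5.
    return(ifelse(is.na(x), NA, 0.5))
  }
  (x - rng[1]) / (rng[2] - rng[1])
}

# Illustrative toy data (hypothetical players and values).
toy <- data.frame(
  Name = c("A", "B", "C", "D"),
  X3PP = c(0.35, 0.38, 0.40, 1.00),  # D is "perfect" on very few attempts
  PTS  = c(20, 22, 25, 30)
)

# Rescale every numeric column separately; Name is left untouched.
toy_scaled <- toy
num_cols <- sapply(toy, is.numeric)
toy_scaled[num_cols] <- lapply(toy[num_cols], rescale01)
print(toy_scaled)
```

In the toy output, players A to C all land below 0.1 on X3PP even though their raw percentages differ, because player D's 1.00 defines the top of the scale.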

Next, a heatmap is generated for the 2018-19 season.

The 2018-19 data were taken from an NBA statistics website and cleaned up to remove some columns that cannot be compared: https://www.nbastuffer.com/2018-2019-nba-player-stats/

#Read in 2018-19 data set
nba18<-read.csv("E:/Stats.FINAL_PROJECT/2018_2019NBAPlayerStatsRegSeason.csv",header=TRUE, fileEncoding="UTF-8-BOM")

#Remove duplicate names in this dataframe (keeps only the first row for each player)
nba18<-nba18[!duplicated(nba18$NAME),]

#Use the first column (player names) as the row names
nba18.1<-nba18[,-1]
rownames(nba18.1)<-nba18[,1]

#nba18.1$NAME<-with(nba18.1,reorder(NAME,PTS))
#nba18.1.m<-melt(nba18.1)
#Top 50 scorers of 2018-19, plotted in the same style as the 2008 data

nba18sorted<-nba18[order(-nba18$PTS),]
nba18.top50<-head(nba18sorted,50)
nba18.melted<-melt(nba18.top50)
## Using NAME as id variables
nba18.melted<-ddply(nba18.melted, .(variable),transform,rescale=rescale(value))
ggplot(nba18.melted,aes(variable,NAME))+geom_tile(aes(fill=rescale),colour="white")+scale_fill_gradient(low="white",high="steelblue")
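The 2008 and 2018-19 heatmaps go through the same preparation (sort by points, keep the top 50, melt to long format, rescale each statistic), written out separately each time. A hypothetical helper that does this once is sketched below in base R, so it does not depend on reshape2 or plyr. It assumes the name column is unique (duplicates were removed above) and that every other numeric column is a statistic to plot.

```r
# Hypothetical helper: keep the top-n scorers, reshape to long format and
# rescale each statistic to [0, 1]. Column names are passed in as arguments.
prep_heatmap_data <- function(df, name_col, pts_col = "PTS", n = 50) {
  df <- head(df[order(-df[[pts_col]]), ], n)
  stat_cols <- setdiff(names(df)[sapply(df, is.numeric)], name_col)
  long <- do.call(rbind, lapply(stat_cols, function(col) {
    v <- df[[col]]
    rng <- range(v, na.rm = TRUE)
    data.frame(
      Name     = as.character(df[[name_col]]),
      variable = col,
      value    = v,
      # Per-statistic min-max rescaling (0.5 for a constant column).
      rescale  = if (diff(rng) == 0) 0.5 else (v - rng[1]) / diff(rng),
      stringsAsFactors = FALSE
    )
  }))
  # Levels in increasing order of points, as reorder(Name, PTS) gives,
  # so the top scorer ends up at the top of the y axis. Assumes unique names.
  long$Name <- factor(long$Name, levels = rev(as.character(df[[name_col]])))
  long$variable <- factor(long$variable, levels = stat_cols)
  long
}

# Illustrative toy data (hypothetical players and values).
toy <- data.frame(NAME = c("A", "B", "C"), PTS = c(10, 30, 20), AST = c(5, 1, 3))
long <- prep_heatmap_data(toy, name_col = "NAME", n = 2)
print(long)
```

With the real data, `prep_heatmap_data(nba18, name_col = "NAME")` should give a long table that can be plotted with `ggplot(long, aes(variable, Name)) + geom_tile(aes(fill = rescale), colour = "white") + scale_fill_gradient(low = "white", high = "steelblue")`, the same call used for both seasons above.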

Now with the full 2018-19 dataset, just to take a look.

#Now with all data
nba18.allplayers.melted<-melt(nba18sorted)
## Using NAME as id variables
nba18.allplayers.melted<-ddply(nba18.allplayers.melted, .(variable),transform,rescale=rescale(value))

ggplot(nba18.allplayers.melted,aes(variable,NAME))+geom_tile(aes(fill=rescale),colour="white")+scale_fill_gradient(low="white",high="steelblue")

Now regenerate the 2008 heatmap without Yao Ming, who is an outlier in 3-point percentage (3PP).

#Read in original 2008 data set with Yao Ming removed
nba08noYao<-read.csv("E:/Stats.FINAL_PROJECT/ppg2008.noYao.csv",header=TRUE)
nba08noYao$Name<-with(nba08noYao,reorder(Name,PTS))
nba08noYao.m<-melt(nba08noYao)
## Using Name as id variables
nba08noYao.m<-ddply(nba08noYao.m, .(variable), transform, rescale=rescale(value))
#Set the ggplot up to create the heatmap for the 2008-2009 NBA data without Yao Ming
pnoYao<-ggplot(nba08noYao.m, aes(variable,Name))+geom_tile(aes(fill=rescale),colour="white")+scale_fill_gradient(low="white",high="steelblue")
pnoYao
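The 2008 heatmap without Yao Ming is built from a second file, ppg2008.noYao.csv. An alternative is to drop the row in code so that both heatmaps come from one source file. The sketch below assumes the player is stored as "Yao Ming" in the Name column, possibly with stray surrounding spaces; the helper name `drop_players` is hypothetical.

```r
# Hypothetical helper: remove players by name instead of keeping a second CSV.
# trimws() guards against stray leading/trailing spaces in the Name column
# (an assumption about the file, not something verified here).
drop_players <- function(df, players, name_col = "Name") {
  keep <- !(trimws(as.character(df[[name_col]])) %in% players)
  # droplevels() removes the now-unused factor level if Name is a factor.
  droplevels(df[keep, , drop = FALSE])
}

# Illustrative toy data (hypothetical values).
toy <- data.frame(Name = c("Yao Ming ", "Player B", "Player C"),
                  X3PP = c(1.0, 0.40, 0.35))
toy_noYao <- drop_players(toy, "Yao Ming")
print(toy_noYao)
```

With the real data, `nba08noYao <- drop_players(nba08, "Yao Ming")` would replace the second `read.csv()` call, and the melt, rescale and ggplot steps stay the same.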

DISCUSSION:

– At first glance, 3-point shooting seems to have been much less of a priority for the 2008 players: apart from one extreme value, the players have lower but mostly similar 3-point percentages. In the 2018 data, 3P% is also fairly consistent across players, but the values sit much higher on the color scale (darker tiles) than in the older data. The standout shooters in 2018 are, unsurprisingly, Steph Curry, Klay Thompson, and Danilo Gallinari. HOWEVER, the 2008 heatmap is harder to read for this statistic, because Yao Ming apparently made 100% of his 3-point shots on virtually no attempts; he was an extremely tall player, even for a center, a position that rarely shot threes at the time. Because every column is rescaled to the 0-1 range within its own season, that single 1.000 value becomes the darkest tile and pushes every other player toward the light end of the scale, which makes color comparisons between the two seasons unreliable for this column.

– After further investigation, the issue extends beyond Yao Ming: many centers have extremely high or low 3-point percentages because they attempted so few threes. Even so, removing Yao as an outlier still improves the visualization of 3-point percentages for the rest of the 2008 dataset.

– This is why the 2008 heatmap was regenerated with Yao Ming removed, to better visualize the relative 3PP performance of the top players of the time. Among these top players, 3PP is at a similarly high level across the board, with a few standout players. This suggests that being a good shooter, especially from beyond the 3-point line, was an important part of a primary scorer’s game, which is further supported by many players also having a high free throw percentage (FTP). A few data points deserve specific mention, particularly the very dark blue and white tiles on the heatmap. These extremely high and low 3PP values reflect the playstyle of the most successful power forwards and centers of the league, who took few or no 3-point shots, so their percentages come out either extremely high or extremely low. These players include Pau Gasol and Yao Ming with extremely high percentages, and Shaquille O’Neal, Tim Duncan, and Dwight Howard with extremely low percentages. These players typically found their scoring success with high-percentage shots inside the 3-point line, using their size and strength, which is supported by many of them also having extremely high field goal percentages (FGP). They were also successful in aspects of the game other than scoring: many of the players with extremely low 3PP also have unusually high rebounding and block statistics.

– Some players display deeply shaded tiles on the heatmap, indicating unusually high performance in certain areas of the game. One such player is Chris Paul, who is close to the top of the Name axis and has very dark tiles in the assist (AST) and steal (STL) categories. This reflects his role as a crafty and strong playmaker who found success not just by scoring himself but by making plays happen on both ends of the court. Another notable player is Dwight Howard, who has virtually no value in the 3-point shooting categories. However, he clearly found success in other ways, shown by his very high values in all rebounding categories, as well as in blocks and free throw attempts. This points to his success as a defensive player and with plays at the rim, where opponents often fouled him in the act of scoring, earning him many free throws.

– Other players that stand out include Kevin Martin, who appears to be a good shooter in general and has the highest free throw percentage among the players in this dataset. Another is Deron Williams, with assists at about the same level as Chris Paul, and Stephen Jackson, with the most turnovers in the dataset. Lastly, Corey Maggette has by far the highest value in the personal fouls (PF) column. Note that PF counts fouls a player commits, not fouls drawn, so the style that further investigation attributes to him (drawing fouls while scoring and creating points with his solid free throw shooting) would show up in the free throw attempt (FTA) column rather than in PF.
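Several observations above come from reading tile colors by eye: which players have extreme 3PP on very few attempts, whether players who rarely shoot threes rebound and block more, and who leads each column. A minimal base-R sketch of how those readings could be checked numerically follows. The helper names are hypothetical, the column names (X3PA, X3PP, TRB, BLK, FGP) are assumed to be how `read.csv()` imports the ppg2008.csv headers, and the 1.0 attempts-per-game cutoff is an arbitrary illustrative choice.

```r
# Assumed column names: X3PA and X3PP (read.csv() turns "3PA"/"3PP" into these),
# TRB, BLK and FGP. The 1.0 attempts-per-game cutoff is arbitrary.

# 1. Blank out 3-point percentages that rest on very few attempts.
mask_low_volume <- function(df, pct_col = "X3PP", att_col = "X3PA", min_att = 1.0) {
  low <- !is.na(df[[att_col]]) & df[[att_col]] < min_att
  df[[pct_col]][low] <- NA
  df
}

# 2. Compare average stats of players who rarely shoot threes vs. everyone else.
compare_by_three_volume <- function(df, stats = c("TRB", "BLK", "FGP"),
                                    att_col = "X3PA", min_att = 1.0) {
  group <- ifelse(df[[att_col]] < min_att, "rarely shoots 3s", "shoots 3s")
  aggregate(df[stats], by = list(group = group), FUN = mean)
}

# 3. For every numeric column, the player with the highest value
#    (the darkest tile in that column). Ties go to the first player listed.
column_leaders <- function(df, name_col = "Name") {
  num <- names(df)[sapply(df, is.numeric)]
  data.frame(
    stat   = num,
    leader = vapply(num, function(col)
      as.character(df[[name_col]][which.max(df[[col]])]), character(1)),
    value  = vapply(num, function(col) max(df[[col]], na.rm = TRUE), numeric(1)),
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Illustrative toy data (hypothetical players and values, not real statistics).
toy <- data.frame(
  Name = c("Center A", "Center B", "Guard A", "Guard B"),
  X3PA = c(0.0, 0.2, 5.0, 7.0),
  X3PP = c(1.000, 0.000, 0.380, 0.360),
  TRB  = c(12, 10, 4, 3),
  BLK  = c(2.5, 1.9, 0.3, 0.2),
  FGP  = c(0.58, 0.55, 0.44, 0.45)
)

masked  <- mask_low_volume(toy)
groups  <- compare_by_three_volume(toy)
leaders <- column_leaders(toy)
print(masked)
print(groups)
print(leaders)
```

Applied to the real data this would be, for example, `column_leaders(nba08)` or `compare_by_three_volume(nba08)`. Masked values stay `NA` after rescaling, and ggplot2 draws them in the fill scale's `na.value` colour (grey by default), so low-volume shooters no longer appear as the best or worst 3-point shooters. With only 50 players, a group comparison like this is descriptive, not a significance test.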

2018

– Conclusions: The most successful players in the league can most simply be identified by ranking them by points scored. However, this is not the complete picture, because players found this success in different aspects of the game. Both the 2008 and 2018 data show that these players typically made their impact on their teams by being excellent shooters, defenders (blocks and rebounds), playmakers, and/or inside scorers. Furthermore, excelling in a particular category tends to go along with stronger performance in certain related areas and weaker performance in others, since players tend to follow a particular playstyle based on their speed, size, and scoring ability, often at the cost of other areas of the game.
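The conclusion that players sort into playstyles (shooters, defenders, playmakers, inside scorers) could be tested more directly by clustering players on their statistics. A minimal sketch follows, assuming a data frame shaped like nba08noYao: a Name column plus numeric statistics, with no missing values and no constant columns. It uses Ward hierarchical clustering on z-scored columns; the helper name, the linkage method and the number of groups `k` are illustrative choices, not part of the analysis above.

```r
# Hypothetical helper: group players into k playstyle clusters.
# Columns are z-scored first so that large-scale stats (minutes, points)
# do not dominate the Euclidean distances.
playstyle_clusters <- function(df, name_col = "Name", k = 3) {
  num <- df[sapply(df, is.numeric)]
  z <- scale(num)                          # center and scale each column
  hc <- hclust(dist(z), method = "ward.D2")
  setNames(cutree(hc, k = k), as.character(df[[name_col]]))
}

# Illustrative toy data (hypothetical players and values).
toy <- data.frame(
  Name = c("Big1", "Big2", "Big3", "Guard1", "Guard2", "Guard3"),
  TRB  = c(12, 11, 10, 4, 3, 4.5),
  BLK  = c(2.5, 2.0, 2.2, 0.3, 0.2, 0.4),
  AST  = c(1.5, 2.0, 1.8, 8.0, 7.0, 9.0)
)
styles <- playstyle_clusters(toy, k = 2)
print(split(names(styles), styles))
```

`split(names(styles), styles)` lists the players in each group; comparing those groups against the dark tiles on the heatmap would show whether the clusters line up with the playstyles described above.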

Clustered heatmaps with dendrograms for the 2008 data without Yao Ming and for the top 50 scoring players in 2018 can also be found below.

heatmaply(nba08noYao)
heatmaply(nba18.top50)
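The two `heatmaply()` calls above pass data frames that still contain the player-name column. Whether heatmaply uses that column as row labels is not verified in this notebook, so the sketch below makes the labels explicit: it builds a numeric matrix with player names as row names, a form that both `heatmaply()` and base R's `stats::heatmap()` accept. The helper name is hypothetical.

```r
# Hypothetical helper: numeric matrix with player names as row names, so a
# clustering heatmap labels rows by player. Non-numeric columns are dropped.
to_named_matrix <- function(df, name_col) {
  num <- df[sapply(df, is.numeric)]
  m <- as.matrix(num)
  # trimws() guards against stray spaces around names (an assumption).
  rownames(m) <- trimws(as.character(df[[name_col]]))
  m
}

# Illustrative toy data (hypothetical players and values).
toy <- data.frame(Name = c(" Player A", "Player B "),
                  PTS  = c(25.1, 22.4),
                  AST  = c(5, 8))
m <- to_named_matrix(toy, "Name")
print(m)
```

With the real data this would be, for example, `heatmaply(to_named_matrix(nba08noYao, "Name"), scale = "column")` or `heatmaply(to_named_matrix(nba18.top50, "NAME"), scale = "column")`. Column scaling matters because statistics on large scales, such as minutes or points, would otherwise dominate the color range. Base R's `stats::heatmap()` defaults to `scale = "row"` and computes its dendrograms on the unscaled values, so passing `scale(m)` and `scale = "none"` there keeps the clustering and the colors on the same standardized scale.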